FRAMID Transformation

May 28, 2023

Qiushan Tao

Co-leader, FHS-BAP Data Core

summary

The health exam datasets from the Framingham Heart Study (FHS) identify each participant uniquely by the pair 'idtype' and 'id'. Some shared datasets instead use a single identifier, 'framid', for the same purpose. When merging data from both sources, the first step is to derive 'framid' from 'idtype' and 'id' so that it can serve as the primary key. The programs below expect variable names in lowercase; if yours are in uppercase, rename them or adjust the code accordingly, especially if your software is case-sensitive.
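
As a minimal sketch of the workflow described above, the following Python snippet lowercases the column names, derives 'framid', and merges the result with a second dataset keyed by 'framid'. The offsets are copied from the lookup table used by the functions in this tutorial; the helper name `add_framid`, the `shared` dataset, and the `SBP` and `score` columns are hypothetical and contain dummy values only.

```python
import pandas as pd

# Offsets copied from the lookup table used by the functions in this tutorial.
OFFSETS = {0: 0, 1: 80000, 2: 20000, 3: 30000, 7: 70000, 72: 720000}

def add_framid(exam: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `exam` with lowercase column names and a framid column."""
    out = exam.rename(columns=str.lower)  # e.g. IDTYPE -> idtype
    out["framid"] = out["id"] + out["idtype"].map(OFFSETS)
    return out

# Dummy exam data with uppercase names (hypothetical columns and values).
exam = pd.DataFrame({"IDTYPE": [0, 1, 3], "ID": [1, 2, 4], "SBP": [120, 131, 118]})

# Hypothetical shared dataset that is already keyed by framid.
shared = pd.DataFrame({"framid": [1, 80002, 30004], "score": [0.4, 0.7, 0.1]})

# Left merge keeps every exam row, matched on the derived key.
merged = add_framid(exam).merge(shared, on="framid", how="left")
print(merged[["framid", "sbp", "score"]])
```

Renaming into a copy leaves the original exam data untouched, so the same input can still be passed to code that expects the uppercase names.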

note

The study IDs provided in this tutorial are dummy IDs used for demonstration purposes only and do not represent real data.

get_framid.r
#####################################################################
#  License      : This source code is licensed under the MIT license.
#  Author(s)    : FHS-BAP data core (QT).
#  Release date : TBA
#  Description  : Get the framid from IDTYPE and ID in FHS Data. 
#  Usage        : library(dplyr)
#               : data <- data %>%
#               :     dplyr::mutate(framid = get_framid(idtype, id))
#####################################################################

get_framid <- function(idtype, id) {
  # Check if idtype and id have compatible lengths
  if (!((length(idtype) == 1 && length(id) >= 1) || 
        (length(idtype) == length(id)))) {
    stop("Error: The lengths of 'idtype' and 'id' don't match!")
  }
  
  # Define the id_matrix: the offset added to 'id' for each cohort
  id_matrix <- data.frame(
    idtype = c(0, 1, 2, 3, 7, 72),
    cohort = c('Gen 1', 'Gen 2', 'NOS', 'Gen 3', 'Omni 1', 'Omni 2'),
    adjust_factor = c(0, 80000, 20000, 30000, 70000, 720000)
  )
  
  # Look up the offset with match() rather than merge(): merge() sorts its
  # result by the key, so the returned values would no longer line up with
  # the input rows. An unrecognized idtype yields NA.
  adjust_factor <- id_matrix$adjust_factor[match(idtype, id_matrix$idtype)]
  
  # Calculate the framid by adding id and adjust_factor
  return(id + adjust_factor)
}

The Python function used to retrieve the 'framid'.

get_framid.py
#########################################################################
#  The Python function used to retrieve the 'framid' from idtype and id.
#  NOTE: The study IDs provided in this tutorial are dummy IDs used for
#        demonstration purposes only and do not represent real data.
#########################################################################

import pandas as pd

def get_framid(examdata):
    # Fail early if the required columns are missing
    missing = {'idtype', 'id'} - set(examdata.columns)
    if missing:
        raise ValueError(f"examdata is missing required column(s): {sorted(missing)}")

    # Create id_matrix DataFrame
    id_matrix_data = {
        'idtype': [0, 1, 2, 3, 7, 72],
        'cohort': ['Gen_1', 'Gen_2', 'NOS', 'Gen_3', 'Omni_1', 'Omni_2'],
        'adjust_factor': [0, 80000, 20000, 30000, 70000, 720000]
    }
    id_matrix = pd.DataFrame(id_matrix_data)

    # Left merge so every row of examdata is kept, in its original order;
    # rows with an unrecognized idtype get NaN for adjust_factor and framid
    merged_data = pd.merge(examdata, id_matrix, on='idtype', how='left')

    # Calculate framid
    merged_data['framid'] = merged_data['id'] + merged_data['adjust_factor']

    # Drop unnecessary columns
    merged_data.drop(['cohort', 'adjust_factor'], axis=1, inplace=True)

    return merged_data

###############################################################
# Example usage:
#  (1) Create sample data for demonstration 
#  (2) Apply the get_framid() function to the sample data.
###############################################################
examdata = pd.DataFrame({
    'idtype': [0, 1, 2, 3, 7, 72],
    'id': [1, 2, 3, 4, 5, 6]
})

result = get_framid(examdata)
print(result)
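
A row whose idtype is not in the lookup table cannot be assigned a valid framid, so it can be worth checking for such rows before calling the function. The snippet below is a minimal sketch of one way to do that; the helper name `find_unknown_idtypes` and the idtype value 9 are made up for illustration.

```python
import pandas as pd

# idtype codes taken from the lookup table in get_framid()
KNOWN_IDTYPES = [0, 1, 2, 3, 7, 72]

def find_unknown_idtypes(examdata: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose idtype has no entry in the lookup table."""
    return examdata[~examdata["idtype"].isin(KNOWN_IDTYPES)]

# Dummy data; 9 is a made-up idtype used only to trigger the check.
check = pd.DataFrame({"idtype": [0, 1, 9], "id": [1, 2, 3]})

unknown = find_unknown_idtypes(check)
print(unknown)
```

An empty result means every row can be mapped; otherwise the returned rows show which records need attention.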

The SAS code used to retrieve the 'framid'.

get_framid.sas
/*********************************************************************
* The SAS code used to retrieve the 'framid' from idtype and id.
* Create sample data for demonstration.
* NOTE: The study IDs provided in this tutorial are dummy IDs used for
*       demonstration purposes only and do not represent real data.
**********************************************************************/
data examdata;
    input idtype id;
    datalines;
0   1
1   2
2   3 
3   4
7   5
72  6 
;
run;

data id_matrix;
    input idtype cohort $ adjust_factor;
    datalines;
0  Gen_1       0
1  Gen_2   80000
2  NOS     20000
3  Gen_3   30000
7  Omni_1  70000
72 Omni_2 720000
;
run;

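/* MERGE with BY requires both datasets to be sorted by idtype */
proc sort data=examdata;  by idtype; run;
proc sort data=id_matrix; by idtype; run;

data get_framid;
    merge examdata (in=in_exam) id_matrix;
    by idtype;
    if in_exam;  /* keep only rows that come from examdata */
    /* Flag rows that cannot be mapped: missing id or unrecognized idtype */
    if missing(id) or missing(adjust_factor) then do;
        put "Error: missing 'id' or unrecognized 'idtype': " idtype= id=;
        _ERROR_ = 1;
    end;
    framid = id + adjust_factor;
    drop cohort adjust_factor;
run;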
